ABSTRACT


This thesis describes an off-line character recognition procedure, 
tested on the ten digits. The basic scheme is an attempt to recognize 
the character on the basis of a general memory scheme dependent on the 
overall shape of the entire character. Failing this, a system of checking 
two specific features, loops and spurs, is called into play on an as-needed
basis. This idea of considering specific features only if a more general 
recognition procedure does not result in a single answer is unique in the 
character recognition field. 

Comparison to other systems indicates that the system is relatively
successful, particularly in view of the reduced computer effort expended.

I. INTRODUCTION


Since the mid-1950's, computerized character recognition has been a 
matter of interest to both the business and academic communities. The 
business applications initially required very highly controlled input, 
such as the E13B magnetic font on checks, but have been progressively 
widening their range of allowable inputs. The Postal Office's Zip Code
reader is a good example of a recent successful Optical Character Reader
[1]. The academic approaches have been more theoretical in nature and 
have concentrated on hand-printed characters. 

This thesis presents a new approach to the recognition of hand-printed
numerals. It is argued that previous methods of recognition have been
unnecessarily complex, due in part to a dissimilarity to human recognition
procedures. It was the author's experience when first beginning to explore
this field that these systems were difficult to comprehend, apply, or
research. The procedure detailed here is simple in concept, more nearly
imitative of a conscious human process¹, and comparatively simple to
duplicate. It is felt that potential improvements to the system could
raise its accuracy level to the high nineties. Areas which appear
most susceptible to improvement are pointed out below. In its present 
form, its success rate is 93%. These results were obtained using a set 
of characters which consisted of fifty examples of each of the ten digits. 


The five hundred test numerals were obtained from A.L. Knoll of Honeywell
Industries [2], and consist of characters presented on a twenty-one by
twenty-five binary matrix. A sample of three successfully recognized
characters is shown in figure 1.

[Figure 1. Sample of three successfully recognized characters; panel 1B
shows a "7".]

This thesis is presented in three parts. First, a statement of the
basic ideas. Second, five other specific systems are outlined. A detailed
description of the present system is given in part three.

¹The author's concept of the conscious human method of character
recognition is explained in section II.


II. BASIC IDEAS 


The basic idea of the proposed system was gleaned from a conversation 
between the author and his wife. Asked how she recognized characters, her 
reply was, "I don't think about it, I know them." Given the hypothetical 
situation that a character was drawn so as to be confusing, she theorized
that each character had specific features which aided recognition. The 
scheme of this system is: 1) An attempt to recognize the character with 
a fast and simple procedure, 2) In case of confusion, i.e., an inability 
to distinguish between two or more possible values for a character, a 
determination based on specific features. It remained to find a procedure 
that would accomplish step one, as the typical system uses only step two, 
i.e., a list of specific features is obtained prior to attempting to
recognize the character. 

A survey of material on the methods of teaching character recognition 
to children [3, 4, and 5] also indicated that this idea is a possible 
description of the human process of character recognition. In every 
case, learning of the characters is taught through repetition and memo- 
rization of the overall appearance of the character as it appears within 
a word. The most specific instruction suggestion is in [2], when, in 
the case of frequently confused letters, memory devices such as the 
following are offered: 

     [...]
     b is on the line
     b is tall like a building
     b looks to the right

In other words, current teaching practice is to have children memorize
the characters' appearance and, in case of confusion, look for specific
appropriate features. The latter is of particular importance when asking
the children to reproduce characters in their own hand [6]. 

The proposed system immediately knows most numerals based on a simple 
memory scheme involving two very general parameters. These two parameters 
were originally suggested by the method described in [7] and below. They 
are the row and column vectors and are dependent upon the overall configu- 
ration and structure of a character. These parameters are easily determined 
and correct recognition was possible, using only these parameters, in 78.8%
of the cases. The decision to rely heavily on this limited amount of
information was based initially on a hand-simulation of the parameters on
a set of carefully drawn characters. It was observed that even when the
drawn characters were allowed to deviate from the norm, at least one of
the parameters was almost invariably equal, or nearly equal, to the ideal.
From this it was deduced that when use of the parameters could not
determine a single answer, it would enable the compilation of a narrowed
list of possibilities.

The second step is checking for specific features, namely closed 
loops and spurs. Only those features which will help to differentiate 
between particular choices will be considered. If, for instance, the row
and column vectors narrowed the possibilities to "0" and "9," it would
be superfluous to determine if the character has any closed loops. Nor
would a human, once he had decided that a character was either a "0" or
a "9," look at the loop. In this case, the feature which facilitates
recognition for both the human and the machine is the presence of a spur.

Using this information as a basis, the proposed system consists of:
1) a general memory scheme that will consider the character as a whole,
and 2) a dynamic system of checking specific features. Throughout this
paper, "dynamic" is used to mean "on an as-needed basis," i.e., certain
steps will be performed only when a need to perform them has been
determined. As previously noted, only step one was necessary in almost
80% of the cases.

The consideration of the character as a whole consists of intersecting 
the character with a series of horizontal and vertical lines and keeping 
track of how many lines each of the rays intersects. Figure 2 illustrates
this process.

[Figure 2. 2A. Horizontal Scan Vector; 2B. Vertical Scan Vector]


The resulting vectors are then compressed by eliminating successive
duplications and all zeroes. In this way, the horizontal scan vector of
figure 2A becomes 2-10-1, while the vertical vector is 10-1-10-1. These
two vectors can now be compared with the vector pairs in the memory and
the answer "4" obtained.

In some cases the vectors obtained from a character are not adequate 
matches with any of the standard sets. No answer results, but the choice 
is narrowed, usually to two or three possibilities, on the basis of
closeness of fit. The closeness of fit is determined by a least squares
system which is explained in Section IV of this paper.








III. CURRENT SYSTEMS


Among five current systems chosen for comparison in this thesis, 
none use the basic approach just described. The five were chosen to be 
included because of their wide divergence and relative success. 

The procedure presently in use by the Japanese Postal Department is 
based on a horizontal scan technique far more complicated than simply 
counting the lines crossed by each matrix row [7]. A three by three 
window moves to the right one column at a time, highlighting each 
successive portion of the character. The pattern within the window is 
categorized, the information stored, and the window shifted. At the 
completion of each horizontal scan the window is moved back to the left 
edge, down two rows, and again begins shifting to the right. The
extracted features are identified as belonging to a particular character
by utilizing a set of sequential decision diagrams. Due to the divergent
ways of legitimately drawing any given number, several transition diagrams 
are used for each number. 

The authors do not give any statistical results, but only state that:

     "... Three models have been installed in Tokyo and Osaka
     Central Post Offices and constantly field tested to
     achieve high performance levels..."

It can be reasonably assumed that their recognition percentage must be 
at least in the mid-nineties. 

Munson's system for character recognition was developed and tested 
on a forty-six character alphabet, rather than the ten character set of 
digits used by the other systems [8]. Because of the increased possible
choices and the number of characters which are difficult even for a human
to differentiate between out of context, e.g., "C" and "(," the longer
the alphabet, the more difficult the problem. In view of this, the fact
that Munson's procedure is the most complicated is not surprising. If the
proposed system were to attempt the same forty-six character alphabet, the
percentage of successful recognition would surely drop, but not, it is felt,
significantly below Munson's. The idea of considering context when
recognizing a character, a procedure which greatly eases the human
recognition problem, is not utilized by any of the systems discussed in
this thesis, but is detailed in [8].

Two exhaustive procedures are used to extract feature characteristics 
of the character under consideration. The first of these, PREP, is "... a
simulation of a previously constructed optical preprocessor capable of
extracting, in parallel, 1024 optical correlations between a character
image and a set of photographic templates, or masks." The second, TOPO,
"... extracted a large number of topological and geometric features of
the character image."

PREP's output is nine 84-bit feature vectors. Each feature vector
describes the location and orientation of edges in each of 84 regions.
The nine vectors are the result of presenting each image to PREP nine
different times, "... first in the center of the 24X24 field, then in
the eight positions formed by translating it vertically and/or horizontally
by two bits."

TOPO's output is 68 features, "... 16 for the spurs, 16 for the 
concavities, 8 for the enclosures, 6 for overall character size and shape, 
and 22 resulting from special calculations about the width of the character 
at various levels, discontinuities in the profiles, etc." The first step
in TOPO, the construction of a perimeter around, and adjacent to, the
character is used in the system which is the subject of this thesis.

Utilizing the information produced by TOPO and PREP to recognize a
character is the job of a piecewise linear type learning machine, of the
type described by Nilsson.

Munson's results, shown in Appendix A, indicate that both TOPO and 
PREP are necessary to obtain the very good recognition percentage of 85%. 
This percentage is particularly noteworthy in view of the length of the 
alphabet with which it was obtained. 

The system using "characteristic loci," devised by Glucksman and
improved by Knoll, bears a greater resemblance to the present system,
although the basic approach is still quite different. The characteristic
loci contain five numbers and are formed in the following manner.
Starting with a given point in the binary array, the first number of
the locus is set equal to the value of the point. The point will, of
course, be a one if it is a part of the character and a zero if it is not.
Rays are then assumed emanating out from the point to the left, right,
up and down. The second value of the locus is set equal to the number
of lines crossed by the ray extended to the left. The third value
represents the number of lines crossed by the ray extending straight up,
and so on in a clockwise direction. The maximum number of lines crossed
is set at two. Knoll found that sixteen or fewer characteristic loci were
necessary to define a digit when the alphabet consisted of only the ten
numerals.
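
The formation of a single characteristic locus, as described above, can be sketched as follows. This is only an illustration under stated assumptions, not Glucksman's or Knoll's code: the function names are hypothetical, a "line crossed" is taken to be a run of consecutive ones along the ray, and the ray is assumed to start at the cell next to the point rather than at the point itself.

```python
from typing import List, Sequence

MAX_CROSSINGS = 2  # the text caps the number of lines crossed at two


def count_crossings(cells: Sequence[int]) -> int:
    """Count runs of 1s met while walking along a ray, capped at MAX_CROSSINGS."""
    crossings = 0
    previous = 0
    for value in cells:
        if value == 1 and previous == 0:
            crossings += 1
        previous = value
    return min(crossings, MAX_CROSSINGS)


def characteristic_locus(matrix: List[List[int]], row: int, col: int) -> List[int]:
    """Return [point value, left, up, right, down] for the point (row, col)."""
    left = matrix[row][:col][::-1]                         # walking leftward
    up = [matrix[r][col] for r in range(row - 1, -1, -1)]  # walking upward
    right = matrix[row][col + 1:]
    down = [matrix[r][col] for r in range(row + 1, len(matrix))]
    rays = (left, up, right, down)                         # clockwise from the left
    return [matrix[row][col]] + [count_crossings(ray) for ray in rays]


# Hypothetical test pattern: a small closed ring.
RING = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
```

Under these assumptions the point inside the ring sees one line in every direction, while a point to the left of the ring on the middle row sees two lines to its right and none elsewhere.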

The "characteristic loci" features are utilized in two separate 
recognition schemes. One is an "exact match" while the other is a
linear discriminant function scheme. The linear discriminant function
scheme is a standard recognition procedure which involves the scalar
product of the characteristic vector determined for the character being
tested and the standard vector. Knoll's results with the Honeywell 
numeric data set are the best of any system described in this paper. 
Working with Munson's SRI data, Knoll was able to obtain nearly equal 
results to Munson, when working with just the numerical characters or 
just the alphabetic characters. This was prior to Munson's adding PREP 
to TOPO, however. Knoll's results are included in Appendix A. The 98.9% 
success rate of Knoll's system is significantly better than the 93% 
achieved by the proposed system in its present form. 

The three methods described above all work with characters which have 
been transferred by hardware from their natural drawn state into a binary 
matrix. A procedure which works with a character "as is," is described
by Greanias, et al [10]. By use of a logic control raster, Greanias and
his colleagues were able to construct a system which "... recognized 99.3%
of numerals written by 45 subjects after thirty minutes of training."
There is a degree of dynamic parameter checking in that internal regions
are only investigated if a "0," "1," "6," "8," or "9" is indicated as a
possibility. The utilization of this additional logic is necessary every
time one of these numbers is recognized, rather than on an as-needed basis
to clear up confusion.

All of the above methods, and the author's, are off-line, i.e. the
drawing of the character and its recognition by the machine are separated
by any amount of time that is convenient. One recent on-line attempt at
character recognition is that of Powers [11]. As he points out, the main
characteristic difference between off-line and on-line recognition, which
is the computer knowing the time sequence of the strokes, is a mixed
blessing. Knowing in which sequence the lines were drawn would make
differentiating between characters generally easier, except that if a
person were to draw a zero clockwise when the machine only expected
counter-clockwise zeroes, the machine would fail to recognize the zero,
geometrically perfect or not.

Powers worked with the slopes of successive line segments. His task 
was made more difficult by the fact that he considered only the single 
parameter of direction sequence. His best percentage of recognition was 
92.8%, and this was obtained when he was the only subject inputting
figures. He admits that

     "... natural ambiguities between the direction
     sequence descriptions of different characters
     are resolved by conditioning the user."

His results are included in Appendix A.

All of the systems outlined above share one characteristic. The 
character is completely analyzed, and all parameters are determined, before
any final decision making is attempted. In this they are very unlike 
conscious human behavior. 

Appendix A shows that the least complicated system, Knoll's, is 
superior to Powers', and apparently on a par with the far more complex
system of Munson. A definite statement cannot be made until all systems
have attempted the forty-six character alphabet that Munson's system was
applied to. If a dynamic parameter determination system could be
incorporated in any of the above systems, the amount of computation,
at least, would be reduced.

IV. THE PROPOSED SYSTEM


As with all previous methods discussed, the author's procedure 
involves (1) establishment of standards, (2) preprocessing of the 
character being analyzed, and (3) determination of the value, or name, 
of the character. The proposed standards are few in number and easy to 
understand. 

The five hundred test numerals were drawn from the handwriting of 
thirteen authors. A copy of the character set will remain with Prof. 
Gibbons at the United States Navy Postgraduate School. 

The primary standards, used in the analysis of every character, are 
the vertical and horizontal vectors, explained previously. As was seen
for the figure "4," the vertical vector was 10-1-10-1, while the
horizontal vector was 2-10-1. An area of initial difficulty was
determining the vectors for figures with curves. If, for instance, a
zero were perfectly round and drawn with a very thin line, the horizontal
and vertical vectors would both be 1-2-1. In reality, however, curves are
always flat enough and lines thick enough so that the vectors are, in fact,
10-2-10 for a zero. Some numbers may require two sets of vectors. This
is caused by the fact that in writing numbers with curves, some people
follow through more than others. Figure 3 illustrates this possibility.
The difference between the two possible horizontal vectors is very
significant.

The first additional standard, considered in cases where the primary
system fails to determine a single name for the character, is the number
of closed loops contained in the figure. This standard is exactly as
would be expected. Eight is given a value of two for its two closed
loops; six, nine and zero, a value of 1; and all others, a value of 0.

The last standard is the property of having a spur, or loose end, in 
a particular quadrant. This does not mean that the character must be 
carefully centered on the matrix but that the character is always 
considered as if it were split into four quadrants reading counter- 
clockwise from the upper left. For instance, when scanning from the 
left to the right and down, if the first line that is encountered is a 
spur, such as with a "2," then the upper left standard has a value of 1. 
If, however, the first line encountered is not a spur, such as with a "9,"
then the upper left standard is 0. Reading counter-clockwise, starting 
at the upper left, the standards for "3" are 1-1-0-0 while for "0," they 
are 0-0-0-0. 

In the present version of the system, all the above standards are
predetermined and remain static throughout. This is not, of course, mandatory.
In an earlier version of this basic idea, using only the vertical and 
horizontal vectors, implemented on an SDS 9300 with an AGT/1 graphics 
display, the standards were determined dynamically. With a simple 
averaging function it could be extended to improve itself and thus 
become tuned to a particular user's handwriting. 

The preprocessing performed on the characters is minimal. It may be
considered as compensatory for the inherent problems in reconstructing a
smooth, hand-drawn character on a twenty-five by twenty-one binary matrix.
Gaps and irregular lines naturally appear. The scheme of smoothing
employed has two steps. First is to widen every line by causing any
matrix values immediately to the right and immediately below each "1" in
the original matrix to become "1's" also. After this has been done, a
second pass through the matrix is made, eliminating obvious irregularities.
The procedure is simple. Any time a "1" or a "0" is found bordered on
three sides by two or more of the opposite value, the odd one is changed
to agree with its surrounding values. Figure 4 illustrates this filling
and smoothing process.


     11100000              11110000              11110000
     11100000              11110000              11110000
     10000000              11100000              11110000
     11110000              11111000              11110000
     11100000              11110000              11110000
     11100000              11110000              11110000

     4A. Original portion  4B. After filling.    4C. After filling
         of a line.                                  and smoothing.


This process can be thought of as performing for the machine the
compensations that are automatically performed with the human eye when
ignoring minor irregularities. A weakness is that the smoothing process
applies only to horizontal and vertical lines, not slanted ones.
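
The two preprocessing passes can be sketched as follows. This is a minimal illustration rather than the original program. The filling pass follows the text directly. The smoothing rule is worded ambiguously, so the sketch ASSUMES that a cell is flipped when at least three of its four edge neighbours hold the opposite value; that reading happens to reproduce the fragment of Figure 4, as read from the figure. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def fill(matrix: Matrix) -> Matrix:
    """Widen every line: the cells immediately right of and immediately below
    each 1 of the ORIGINAL matrix become 1 as well."""
    rows, cols = len(matrix), len(matrix[0])
    out = [row[:] for row in matrix]
    for r in range(rows):
        for c in range(cols):
            if matrix[r][c] == 1:
                if c + 1 < cols:
                    out[r][c + 1] = 1
                if r + 1 < rows:
                    out[r + 1][c] = 1
    return out


def smooth(matrix: Matrix) -> Matrix:
    """Flip any cell with three or more edge neighbours of the opposite value
    (an assumed reading of the smoothing rule)."""
    rows, cols = len(matrix), len(matrix[0])
    out = [row[:] for row in matrix]
    for r in range(rows):
        for c in range(cols):
            opposite = 0
            for dr, dc in NEIGHBOURS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and matrix[rr][cc] != matrix[r][c]:
                    opposite += 1
            if opposite >= 3:
                out[r][c] = 1 - matrix[r][c]
    return out


def parse(rows: List[str]) -> Matrix:
    """Turn strings of 0/1 characters into a matrix of ints."""
    return [[int(ch) for ch in row] for row in rows]


FIG_4A = parse(["11100000", "11100000", "10000000",
                "11110000", "11100000", "11100000"])
FIG_4B = parse(["11110000", "11110000", "11100000",
                "11111000", "11110000", "11110000"])
FIG_4C = parse(["11110000"] * 6)
```

Both passes read from the unmodified input and write to a copy, so a newly set value never cascades within the same pass.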

After the character has been preprocessed, the extended horizontal and 
vertical vectors of the character are determined. This is accomplished 
by a straightforward row-by-row and then column-by-column scan of the 
matrix. Whenever fewer than seven consecutive ones are encountered,
followed by a zero, a counter is incremented. When each row or column
is completely scanned, the value of the counter, which represents the
number of lines crossed, is stored and the counter is reset to zero. If
seven or more consecutive ones are found, the value of the counter is set
at 10 to indicate the presence of a straight line. In this way any line
which is longer than 1/3 of the width of the character, or 3/10 of the
height, is considered a straight line. It was arbitrarily decided not
to allow any counter to exceed a value of 10, even if both a straight
line and an intersected line were found in a particular row or column.
While this caused no apparent discrepancies or problems, it might be an 
area for investigation. 
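
The scan described above might look like the following sketch. The names are hypothetical, and one detail is an assumption: a run of ones that reaches the edge of the matrix is treated as if a zero followed it, since the text does not say how such runs are handled.

```python
from typing import List, Sequence, Tuple

STRAIGHT_RUN = 7      # seven or more consecutive ones indicate a straight line
STRAIGHT_VALUE = 10   # value stored when a straight line is present


def scan_line(cells: Sequence[int]) -> int:
    """Return the extended-vector entry for one row or column."""
    crossings = 0
    run = 0
    straight = False
    for value in list(cells) + [0]:   # sentinel zero closes a run at the edge
        if value == 1:
            run += 1
        else:
            if run >= STRAIGHT_RUN:
                straight = True
            elif run > 0:
                crossings += 1
            run = 0
    # The counter never exceeds 10, even when a straight line and other
    # crossings share the same row or column.
    return STRAIGHT_VALUE if straight else min(crossings, STRAIGHT_VALUE)


def extended_vectors(matrix: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Row-by-row scan, then column-by-column scan."""
    rows = [scan_line(row) for row in matrix]
    cols = [scan_line(col) for col in zip(*matrix)]
    return rows, cols
```

For example, `scan_line([0, 1, 1, 0, 0, 1, 0])` counts two crossed lines, while any row containing seven consecutive ones yields 10.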

The next step involves compacting the extended vectors, with one entry 
for each row or column, into the same form as the standard vectors. The 
extended vectors are inspected sequentially with the following rules in 
effect: (1) All zeroes are discarded. (2) Any value except a ten which 
is unequal to both its immediate predecessor and successor is discarded. 
(3) Any nondiscarded value which is unequal to the last value inserted
into the compacted vector is added to the compacted vector. For instance,
if the beginning of an extended vector were 0-0-1-2-2-2-10-2-2, the rules
would be applied in the following manner:

     1) The first two zeroes are discarded.
     2) The one is discarded since it is unequal to both its predecessor,
        zero, and its successor, two.
     3) The first two remains, causing a value of two to be placed in
        the compact vector.
     4) The next two two's are discarded since the last value added to
        the compact vector was a two.
     5) The ten is added to the compact vector.
     6) The next two is added to the compact vector while the one
        following is discarded.
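
The three compaction rules can be expressed in a few lines. The sketch below is illustrative only; the function name is hypothetical, and it assumes that an entry at either end of the extended vector, which lacks a predecessor or successor, is treated as unequal to the missing neighbour.

```python
from typing import List, Sequence


def compact(extended: Sequence[int]) -> List[int]:
    """Apply the three compaction rules to an extended vector."""
    result: List[int] = []
    for i, value in enumerate(extended):
        if value == 0:
            continue                                   # rule 1: drop zeroes
        before = extended[i - 1] if i > 0 else None
        after = extended[i + 1] if i + 1 < len(extended) else None
        if value != 10 and value != before and value != after:
            continue                                   # rule 2: drop isolated values
        if not result or result[-1] != value:
            result.append(value)                       # rule 3: no successive repeats
    return result


example = compact([0, 0, 1, 2, 2, 2, 10, 2, 2])
```

Applied to the worked example, `example` holds 2-10-2, in agreement with the six steps listed above.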


Figure 5 illustrates this process.

[Figure 5. 5A. Extended vector; 5B. Compact vector]

The process of determination can now begin. Two values for each digit 
are found by comparing the newly established compact vectors with the 
standard vectors for each of the ten numerals. These values are the
row difference and the column difference. The first number in the compact
row vector is subtracted from the first number of the standard row
vector and the difference is squared. The same is done with the second 
numbers in each row vector and the squared difference is added to that 
already determined. This continues through the entire set of vectors. 
The procedure is then repeated for the column vectors. The idea is of 
course to have a perfectly formed character that results in a row 
difference and column difference both equal to zero. Of the five 
hundred numerals tested, 247 of them were a perfect match. 
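
A small sketch of the row and column difference computation follows. The names are hypothetical, and the table holds only the two standards quoted in the text (the "4" and the zero); a full table would cover all ten digits. Vectors of unequal length are padded with zeroes, which is an assumption here, although it agrees with the (1-0) term in the worked example given in the discussion of shifting.

```python
from itertools import zip_longest
from typing import Dict, List, Sequence, Tuple

# (row vector, column vector) for the standards quoted in the text.
STANDARDS: Dict[str, Tuple[List[int], List[int]]] = {
    "4": ([2, 10, 1], [10, 1, 10, 1]),
    "0": ([10, 2, 10], [10, 2, 10]),
}


def squared_difference(test: Sequence[int], standard: Sequence[int]) -> int:
    """Sum of squared element-wise differences, padding the shorter vector with 0."""
    return sum((s - t) ** 2 for t, s in zip_longest(test, standard, fillvalue=0))


def differences(test_row: Sequence[int], test_col: Sequence[int]) -> Dict[str, Tuple[int, int]]:
    """Return {digit: (row difference, column difference)} for every standard."""
    return {
        digit: (squared_difference(test_row, row), squared_difference(test_col, col))
        for digit, (row, col) in STANDARDS.items()
    }


scores = differences([2, 10, 1], [10, 1, 10, 1])
```

A well-formed "4" scores (0, 0) against its own standard and a large row difference against the zero.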

When the match is less than perfect, two variations are employed. 
They are "shifting" and choosing a single answer from several lists. 
These operations, which are explained in detail below, are performed 
before secondary standards, i.e. spurs and loops, are used. The
secondary standards were necessary in only 17.2% of the characters tested.

"Shifting" is performed by discarding the first value in each compact
vector and moving all succeeding values forward one place. An example of 
when this operation is necessary follows. If a person draws a four with 
the left vertical stroke significantly higher than the right one, the 
resultant row vector is 1-2-10-1, instead of the standard 2-10-1. The
row difference would be computed as (1-2)² + (2-10)² + (10-1)² + (1-0)²
or 147, definitely not a match. Yet there is no confusion to the human
eye when a character is drawn in this manner. The only requirement is
that the character look more like one numeral than any other. To aid the
computer in making this kind of distinction, shifting is introduced.

After shifting, the compact row vector of 1-2-10-1 becomes 2-10-1, yielding 
a row difference of 0. 
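
The effect of shifting on the example above can be checked with a short sketch. The function names are hypothetical; the numbers are those of the text.

```python
from itertools import zip_longest
from typing import List, Sequence


def squared_difference(test: Sequence[int], standard: Sequence[int]) -> int:
    """Sum of squared differences; a missing entry is treated as zero."""
    return sum((s - t) ** 2 for t, s in zip_longest(test, standard, fillvalue=0))


def shift(vector: Sequence[int]) -> List[int]:
    """Discard the first value and move all succeeding values forward one place."""
    return list(vector[1:])


STANDARD_ROW_4 = [2, 10, 1]
drawn_row = [1, 2, 10, 1]          # left stroke of the four drawn too high

before = squared_difference(drawn_row, STANDARD_ROW_4)
after = squared_difference(shift(drawn_row), STANDARD_ROW_4)
```

Here `before` is 147 and `after` is 0, matching the figures in the text.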

Before and after shifting, several lists are maintained while
sequencing through the standard vectors. Use of these lists is the second
variation necessary when the match is not perfect. Comparisons of row
and column differences determine which group of standard characters will
make up any given list, and no one test character will ever use more than
one of the lists. The particular lists employed were established through
experience. Several additional lists were included in initial attempts. 
As work progressed, however, it became obvious that a smaller number of 
lists was required for an even higher degree of accuracy. 

When matching a character's compact row and column vectors against the 
standard vectors, if the row difference is determined to be zero while the 
column difference is one, the standard character is added to the "Ziplist" 
as one possibility for later consideration. The exclusion from the
Ziplist of characters with a column difference of zero and a row difference
of one was based on experience. Some column vectors are standard for more
than one character, e.g. the standard column vector for "9" and "0" is
10-2-10 and for "5," "6," and "8" is 10-3-10. The standard row vectors 
are, however, unique. Whenever either the row difference or the column 
difference is zero, the standard character is added to the "Zerolist." 
This is the second list (Ziplist being the first) checked for a possible 
answer. When either the row or column difference is between 0 and 10, 
the character is placed in the "Lowlist." Differences between 10 and 100 
qualify the character for the "Goodlist." 

After shifting, the new row and column differences are determined and the
various lists are compiled again as appropriate. If either difference is
equal to zero, the "Zerolistsh" (the "sh" suffix indicating post-shift) is
added to. If either difference is between 0 and 10, "Lowlistsh" increases
in length by one. A difference between 10 and 100 places the standard
character on the Goodlist once again.

An entirely new list, "Trylist," is also developed. Requirements for 
the Trylist are that either the sum of the pre-shift row difference and 
the post-shift column difference, or the sum of the pre-shift column 
difference and the post-shift row difference, be less than three. The 
Trylist is sometimes the third list to be checked for a possible answer.
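
The membership rules for the lists can be collected into one routine, sketched below. The interval boundaries are ASSUMPTIONS, since the text does not say whether "between 0 and 10" and "between 10 and 100" include their end points; here a difference of exactly 10 is placed in the Goodlist range. The function name and the sample differences in the usage note are hypothetical.

```python
from typing import Dict, List, Tuple

Diffs = Tuple[int, int]   # (row difference, column difference)
LIST_NAMES = ("Ziplist", "Zerolist", "Lowlist", "Goodlist",
              "Zerolistsh", "Lowlistsh", "Trylist")


def build_lists(pre: Dict[str, Diffs], post: Dict[str, Diffs]) -> Dict[str, List[str]]:
    """Sort standard characters into lists from pre- and post-shift differences."""
    lists: Dict[str, List[str]] = {name: [] for name in LIST_NAMES}

    def low(d: int) -> bool:
        return 0 < d < 10          # assumed open interval

    def good(d: int) -> bool:
        return 10 <= d < 100       # assumed half-open interval

    for digit, (row, col) in pre.items():
        if row == 0 and col == 1:
            lists["Ziplist"].append(digit)
        if row == 0 or col == 0:
            lists["Zerolist"].append(digit)
        if low(row) or low(col):
            lists["Lowlist"].append(digit)
        if good(row) or good(col):
            lists["Goodlist"].append(digit)

    for digit, (row_sh, col_sh) in post.items():
        row, col = pre[digit]
        if row_sh == 0 or col_sh == 0:
            lists["Zerolistsh"].append(digit)
        if low(row_sh) or low(col_sh):
            lists["Lowlistsh"].append(digit)
        if (good(row_sh) or good(col_sh)) and digit not in lists["Goodlist"]:
            lists["Goodlist"].append(digit)
        # Trylist: a pre-shift difference plus the opposite post-shift
        # difference must total less than three.
        if row + col_sh < 3 or col + row_sh < 3:
            lists["Trylist"].append(digit)
    return lists
```

With made-up differences such as `pre = {"4": (147, 0), "7": (0, 1)}` and `post = {"4": (0, 9), "7": (5, 50)}`, the "7" lands in the Ziplist and the "4" in the Trylist.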

Choosing an answer from the lists is done in the following sequence: 

     1) If only one character has been placed in the Ziplist, it is the
        answer.
     2) If only one character has been placed in the Zerolist, it is the
        answer.
     3) If more than one character has been placed in the Zerolist, the
        Trylist is checked.
     4) If the Trylist has but one member, it is the answer.
     5) If no answer has yet been found, then Ziplist, Zerolist,
        Zerolistsh, and Trylist are checked successively to determine
        if any have multiple entries. The first one found with multiple
        entries is used. The entries will be further evaluated by use
        of the additional standard values, loops and spurs.
     6) Finally, if necessary, Lowlist, Lowlistsh, and Goodlist are
        checked in a similar fashion to determine a set of possible
        choices.
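
The selection sequence above can be sketched as a single function. Two points are assumptions rather than statements of the text: step 4 is applied only when step 3 has sent the search to the Trylist, and in step 6 the first non-empty list is used. The function name is hypothetical. A result with one entry is a single answer; a longer result is the set of choices passed on to the loop and spur checks.

```python
from typing import Dict, List

PRIMARY = ("Ziplist", "Zerolist", "Zerolistsh", "Trylist")
FALLBACK = ("Lowlist", "Lowlistsh", "Goodlist")


def choose(lists: Dict[str, List[str]]) -> List[str]:
    """Return the candidate digits selected from the lists."""
    def get(name: str) -> List[str]:
        return lists.get(name, [])

    if len(get("Ziplist")) == 1:                                # step 1
        return get("Ziplist")
    if len(get("Zerolist")) == 1:                               # step 2
        return get("Zerolist")
    if len(get("Zerolist")) > 1 and len(get("Trylist")) == 1:   # steps 3 and 4
        return get("Trylist")
    for name in PRIMARY:                                        # step 5
        if len(get(name)) > 1:
            return get(name)
    for name in FALLBACK:                                       # step 6
        if get(name):
            return get(name)
    return []
```

For instance, a Zerolist holding "0" and "9" together with a Trylist holding only "9" yields the single answer "9".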

Of the 500 numerals tested, 394 required only the use of row and 
column vectors to determine a single correct answer. In 20 additional 
cases, a single answer was produced, but was incorrect. Eighty-six
required additional standards. Of these, the correct digit was among 
the choices determined in 79 cases. 

In some cases, where the detection of a loop or a spur is necessary, a
preparatory procedure is required.² A perimeter of "2's" is constructed
around the outer edge of the character. An array is simultaneously built
containing the matrix coordinates of the two's, in the order in which
they are positioned. Figure 6 shows the results of this procedure in a
modified example.


     00000                 02220
     01110                 21112
     01010                 21012
     00100                 02120
     01000                 21200
     00100                 02120
     00000                 00200

     6A. Before perimeter  6B. After perimeter   6C. Sequential perimeter
                                                     coordinates [...]

²The basic idea for both this procedure and the subsequent system
used in ascertaining the presence of spurs is described in [7].

The coordinates are used in the search for possible spurs. The "2's" are
necessary for determining the presence of closed loops. This determina- 
tion is made by scanning successive columns. A sequence of 2-1-0-1-2, 
disregarding successive duplications, indicates a closed loop. Appendix B 
contains an example of a character in its original form, after it has been 
filled and smoothed, and then after having a perimeter of "2's" constructed 
around it. 
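
The perimeter marking and the column test for closed loops can be sketched as below. The text builds the perimeter by following the outline in sequence; this sketch SUBSTITUTES a simpler technique, a flood fill of the background from outside the character, which yields the same marked matrix for Figure 6 but does not produce the ordered coordinate list of 6C. The function names are hypothetical.

```python
from collections import deque
from itertools import groupby
from typing import List

Matrix = List[List[int]]
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def add_perimeter(matrix: Matrix) -> Matrix:
    """Mark with 2 every outside background cell that touches the character."""
    rows, cols = len(matrix), len(matrix[0])
    # Pad with a ring of zeroes so the fill can walk around the character.
    padded = [[0] * (cols + 2)] + [[0] + row[:] + [0] for row in matrix] + [[0] * (cols + 2)]
    outside = [[False] * (cols + 2) for _ in range(rows + 2)]
    outside[0][0] = True
    queue = deque([(0, 0)])
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBOURS:
            rr, cc = r + dr, c + dc
            if (0 <= rr < rows + 2 and 0 <= cc < cols + 2
                    and not outside[rr][cc] and padded[rr][cc] == 0):
                outside[rr][cc] = True
                queue.append((rr, cc))
    marked = [row[:] for row in matrix]
    for r in range(rows):
        for c in range(cols):
            if matrix[r][c] == 0 and outside[r + 1][c + 1]:
                if any(0 <= r + dr < rows and 0 <= c + dc < cols
                       and matrix[r + dr][c + dc] == 1 for dr, dc in NEIGHBOURS):
                    marked[r][c] = 2
    return marked


def has_closed_loop(marked: Matrix) -> bool:
    """Look for 2-1-0-1-2 in any column, ignoring successive duplications."""
    pattern = [2, 1, 0, 1, 2]
    for column in zip(*marked):
        squeezed = [value for value, _ in groupby(column)]
        if any(squeezed[i:i + 5] == pattern for i in range(len(squeezed) - 4)):
            return True
    return False


FIG_6A = [[int(ch) for ch in row] for row in
          ["00000", "01110", "01010", "00100", "01000", "00100", "00000"]]
FIG_6B = [[int(ch) for ch in row] for row in
          ["02220", "21112", "21012", "02120", "21200", "02120", "00200"]]
```

Because the enclosed zero of Figure 6A cannot be reached from outside, it stays a zero, and the middle column then reads 2-1-0-1-2, which signals the loop. Perimeter cells that would fall outside the matrix are simply dropped in this sketch.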

The character can now be checked for loops and spurs as found necessary. 
As explained briefly above, the use of the closed loop property is straight- 
forward. It is not necessary to determine the presence of any closed loops 
in the test character unless the fact will aid in discriminating between
choices. If, for instance, the choices determined by the row and column
vectors all have one loop (i.e., "0," "6," "9"), the presence of any loops
in the test character is not even investigated. Similarly, if no loops
are present in any of the choices, a closed loop check is not performed.
Only in situations in which not all the choices have the same number of
loops is a closed loop check of value. When the number of closed loops
in the test character in question is determined, all choices that do not
have the same number of closed loops are discarded. If one choice remains,
it is chosen as the answer. If more than one choice remains, the survivors
are checked for spurs. In rare cases, due to an earlier malfunction, no
choices remain. The list of possibilities is then reinstated and passed
on.

A similar decision logic is employed when checking for spurs. Beginning 
with the upper left corner and proceeding counter-clockwise, the standards 
of the remaining possibilities are compared. If all remaining choices
have, for instance, ones as their upper left standard, i.e., they all
have upper left spurs, then the upper left characteristic of the character
being analyzed is not even determined. Only when there is a possible
decision is that facet of the character checked. If, for instance, the
choice is between "2" and "3," the upper left would not be checked but a
discriminating distinction could be made based on the lower left. In
this case, "3" has a spur in the lower left, and "2" does not. After each
check for a spur, the remaining possibilities are re-evaluated. 

As with closed loops, when one choice remains, it is the answer. If 
more than one choice remains, they are checked for the next possible 
discriminating spur. If none remain, the list of choices is reinstated 
and passed on to the next possible spur check. 
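
The spur checks can be sketched the same way. Again this is a hedged illustration rather than the original program: the names CORNERS, narrow_by_spurs, and has_spur are hypothetical, and the standards are assumed to be stored as one value per corner in the order upper left, lower left, lower right, upper right.

```python
from typing import Callable, Dict, List, Tuple

# Corner order assumed here: start at the upper left, go counter-clockwise.
CORNERS: Tuple[str, ...] = ("upper_left", "lower_left",
                            "lower_right", "upper_right")


def narrow_by_spurs(choices: List[str],
                    standards: Dict[str, Tuple[int, ...]],
                    has_spur: Callable[[str], int]) -> List[str]:
    """Check only the corners that can discriminate among the candidates."""
    remaining = list(choices)
    for i, corner in enumerate(CORNERS):
        if len(remaining) == 1:
            break  # a single choice remains: it is the answer
        if len({standards[c][i] for c in remaining}) <= 1:
            continue  # all candidates agree here, so the corner is skipped
        measured = has_spur(corner)  # examine the test character lazily
        survivors = [c for c in remaining if standards[c][i] == measured]
        if survivors:
            remaining = survivors
        # Otherwise the list is reinstated (left unchanged) for the next corner.
    return remaining
```

A caller would treat a one-element result as the answer and anything longer as a nonrecognition. For the "2" versus "3" case in the text, with both standards agreeing on the upper left and differing on the lower left, only the lower left corner of the test character would be examined.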

The idea of reinstating a list compares with a human reaction, "But 
it must be one of these." If checking for loops and spurs fails to result 
in a single answer, a nonrecognition message is printed. This has never 
occurred. 

Results of this system of character analysis are tabulated in Appendix 
A. The table is constructed to indicate major areas of success and 
weakness. Many errors attributed to initial decisions or to incorrect choices 
are in fact caused by the filling and smoothing routines. Sometimes 
important features are obscured or loops are filled in. However, the 
filling and smoothing procedures aided accurate character recognition 
far more than they hindered it. This was determined through a number of 
tests conducted without the filling and smoothing procedures, during 
which overall accuracy fell below 90%. 


A flow diagram of the proposed system is given in Appendix C. 










V. CONCLUSION 


A study of Appendix A reveals that the proposed system compares 
favorably with the other systems described, and exceeds some other systems not 
described. If accuracy obtained is considered in relation to the amount 
of computer effort expended or to the complexity involved, the results are 
significant. The fact that the system is imitative of human recognition 
procedures and determines parameters dynamically makes it unique. 

Possibly the most important feature of the author's system is the ease 
with which it could be taught. Since the parameters are few in number 
and easy to understand, an instructor could outline with little difficulty 
this method of approach to a class just beginning to explore artificial 
intelligence in general or character recognition in particular. Given a 
system which is relatively uncomplicated to program, yields a high initial 
success rate, and has some obvious areas for improvement, a student's 
interest could be caught and held. 

When the two weak areas, filling and smoothing of the character and 
location of the spurs, are perfected, it is also possible that this system 
would have practical application due to the greatly reduced programming, 
both hard-wired and software. 


In view of the success of the proposed system, it is felt that future 
attempts at character recognition should thoroughly investigate comparatively 
uncomplicated schemes with dynamic determination of parameters, to obtain 
maximum results. 
APPENDIX A 


System Results 


                                                     Correct 
Investigator and Description                         Recognition  Rejects  Errors 
                                                     (%)          (%)      (%) 

Case B1: Highleyman [11], Numeric Data 
  Training and testing on original 500 samples       98.8         0.2      1.0 
Case B2: Training on original 500 
  Testing on 120 new samples                         61.6         19.2     19.2 
Case C1: Duda and Fossum [12], Numeric Data 
  Training on 500 samples 
  Testing on same samples                            100.0        -        0.0 
Case C2: Training on 400 samples 
  Testing on remaining 100                           74.0         -        26.0 
Case D1: Chow [13], Alphanumeric Data 
  (~1800 samples) 
  Training on 80 percent of samples 
  Testing on remaining 20 percent                    55.3         -        41.7 
Case E1: Bledsoe [?], Alphanumeric Data 
  Training on 80 percent of samples 
  Testing on remaining 20 percent                    40.0         -        60.0 
Case F1: Munson, Duda, and Hart [6], 
  Alphanumeric Data, Nearest-Neighbor Rule 
  Based on 80 percent of samples 
  Testing on remaining 20 percent                    ?            -        ? 
Case F2: Preprocessing and Piecewise 
  Linear Classification 
  Training on 80 percent of samples 
  Testing on remaining 20 percent                    68.3         -        ? 
Case F3: Same as Case F2, except numeric 
  data only 
  (500 samples)                                      88.0         -        10 
Case G1: Human Recognition [6], 
  Alphanumeric Data 
  360 samples 
  Averages over ten people                           84.3         -        ? 

? = entry illegible in this copy. 


1. Previous Results, Obtained from Ref. 2 


                                                     Correct 
Description                                          Recognition  Rejects  Errors 
                                                     (%)          (%)      (%) 

Case A1: Numeric Samples Only* 
  394 "good" samples                                 87.6         6.3      6.1 
  105 "bad" samples                                  ?            ?        ? 
  All 499 samples                                    ?            ?        16.5 
Case A2: Upper Case Alphabetics Only** 
  854 "good" samples                                 78.8         10.9     10.3 
  442 "bad" samples                                  ?            39.3     ? 
  All 1296 samples                                   61.8         20.6     17.6 
Case A3: Combined recognition results on 
  all 1793 samples                                   ?            17.6     ? 

? = entry illegible in this copy. 

 * An ambiguity resolving procedure has been used. 
** Correct recognition includes 122 ambiguities that contain the 
   correct symbol as one choice. Methods for resolving ambiguities 
   have not been included in the simulation. 


2. Performance of Knoll's System 
With Standard Highleyman Data, Ref. 2 
[Table: responses recorded for each input digit (1 through 9, then 0), with 
a totals row and a percentages row. The individual entries are not legible 
in this copy; the one legible percentage is 92.8%.] 


Powers' Results from Ref. 11 


N. = Number of Samples of Character Used 
A. = Number Recognized Correctly 

M. = Number Mis-recognized 

X. = Number of Nonrecognition Errors 
                          Single Choice  Choices Contain  Correct 
          Perfect Match   from Lists     Correct Answer   Choice   Total 
Digit     Right  Wrong    Right  Wrong   Yes    No        Chosen   Right  Wrong 
1         48     0        2      0       0      0         0        50     0 
2         13     0        30     1       6      0         6        49     1 
3         10     0        19     1       20     0         17       46     4 
4         21     0        ?      ?       2      0         ?        49     1 
5         ?      ?        ?      ?       23     3         21       43     7 
6         ?      ?        ?      ?       ?      0         ?        45     5 
7         37     0        13     0       0      0         0        50     0 
8         12     0        21     4       9      4         7        40     10 
9         36     0        9      0       5      0         5        50     0 
0         40     3        0      4       3      0         3        43     7 
Totals    247    3        147    17      79     7         71       465    35 
Percent   49.4%           29.4%                           14.2%    93% 

? = entry illegible in this copy. 


6. Results of Proposed System Using Standard Honeywell Data 
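
The percentages in the totals row follow directly from the counts. The short check below is offered only as a reading aid. It assumes 500 test samples, which is the sum of the total right (465) and total wrong (35) entries; the dictionary keys are hypothetical labels for the three ways a character can be recognized correctly.

```python
# Counts of correct recognitions taken from the totals row of the table.
right_counts = {
    "perfect_match": 247,
    "single_choice_from_lists": 147,
    "correct_choice_chosen": 71,
}
total_wrong = 35

total_right = sum(right_counts.values())   # all correctly recognized samples
samples = total_right + total_wrong        # assumed size of the test set

# Each percentage is the count divided by the number of samples.
percent = {name: round(100 * count / samples, 1)
           for name, count in right_counts.items()}
overall = round(100 * total_right / samples, 1)

print(percent, overall)
```

The three partial percentages sum to the overall figure because every correct recognition falls into exactly one of the three categories.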
APPENDIX B 


Example Character 


[Figure: the example character printed as a grid of digits at three stages. 
The grids themselves are not legible in this copy; only the captions survive.] 


Original 


After Filling and Smoothing 


After Perimeter 








APPENDIX C 


Flow Diagram 


1. Read in standard values. 

2. Read in character. 

3. Fill and smooth character, determine extended vectors, compact vectors, 
   compare to standard vectors. 

4. Perfect match? 
      Yes: Print answer. 
      No:  Go to step 5. 

5. Compile lists, shift, compile lists. 

6. Single answer from lists? 
      Yes: Print answer. 
      No:  Check lists for choices; go to step 7. 

7. Would checking loops help? 
      Yes: Check loops, narrow list. Single answer? 
              Yes: Print answer. 
              No:  Go to step 8. 
      No:  Go to step 8. 

8. Would checking any spurs help? 
      Yes: Check appropriate spur, narrow list. Single answer? 
              Yes: Print answer. 
              No:  Repeat step 8. 
      No:  Print nonrecognition message. 